Machine Learning Approach to Augmenting News Headline Generation
نویسندگان
چکیده
In this paper, we present the HybridTrim system which uses a machine learning technique to combine linguistic, statistical and positional information to identify topic labels for headlines in a text. We compare our system with the Topiary system which, in contrast, uses a statistical learning approach to finding topic descriptors for headlines. The Topiary system, developed at the University of Maryland with BBN, was the top performing headline generation system at DUC 2004. Topiary-style headlines consist of a number of general topic labels followed by a compressed version of the lead sentence of a news story. The Topiary system uses a statistical learning approach to finding topic labels. The performance of these systems is evaluated using the ROUGE evaluation suite on the DUC 2004 news stories collection.
منابع مشابه
LexTrim: A Lexical Cohesion Based Approach to Parse-and-Trim Style Headline Generation
In this paper we compare two parse-and-trim style headline generation systems. The Topiary system uses a statistical learning approach to finding topic labels for headlines, while our approach, the LexTrim system, identifies key summary words by analysing the lexical cohesion structure of a text. The performance of these systems is evaluated using the ROUGE evaluation suite on the DUC 2004 news...
متن کاملComparing Topiary-Style Approaches to Headline Generation
In this paper we compare a number of Topiary-style headline generation systems. The Topiary system, developed at the University of Maryland with BBN, was the top performing headline generation system at DUC 2004. Topiary-style headlines consist of a number of general topic labels followed by a compressed version of the lead sentence of a news story. The Topiary system uses a statistical learnin...
متن کاملPredicting News Values from Headline Text and Emotions
We present a preliminary study on predicting news values from headline text and emotions. We perform a multivariate analysis on a dataset manually annotated with news values and emotions, discovering interesting correlations among them. We then train two competitive machine learning models – an SVM and a CNN – to predict news values from headline text and emotions as features. We find that, whi...
متن کاملHeadline Generation for Written and Broadcast News
This technical report is an overview of work done on Headline Generation for written and broadcast news. The report covers HMM Hedge, a statistical approach based on the noisy channel model, Hedge Trimmer, a parse-andtrim approach using linguistically motivated trimming rules, and Topiary, a combination of Trimmer and Unsupervised Topic Discovery. Automatic evaluation of summaries using ROUGE a...
متن کاملTranslation of News Headlines
Machine-Translation of news headlines is difficult since the sentences are fragmentary and abbreviations and acronyms of proper names are frequently used. Another difficulty is that, since the headline comes at the top of a news article, the context information useful to disambiguate the sense of words and to determine their translation(target word) is not available. This paper proposes a new a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005